Search for: All records

Creators/Authors contains: "Hsiao, Sharon"

  1. Scientists and policymakers are increasingly leveraging complex, multi-scale data from diverse, worldwide sources to understand the causes and consequences of economic development, social stratification, climate change, cultural diversity, and violent conflict. This work frequently requires integrating data across diverse datasets by complex, dynamic categories (e.g., ethnicities, languages, religions, subdistricts). However, different datasets encode corresponding categories in disparate formats and at different resolutions (e.g., Guatemala Indigenous vs. Maya vs. K’iche’). These diverse encodings must be translated across datasets before the data can be brought together for analysis. At global scales, across thousands of categories, the combinatorial complexity creates thorny challenges for manual reconciliation and for transparent documentation and sharing of researcher decisions. There is a need to investigate direct and uncomplicated ways to support searching and exploring the semantics of complex and diverse datasets. We design and deploy such a tool, CatMapper, to support semantic discovery through exploration and manipulation of large, complex, and diverse datasets. CatMapper enables exploring contextual information about specific categories; translating new sets of categories from existing datasets and published studies; identifying and integrating novel combinations of datasets for researchers’ custom needs, including automatically generated syntax to merge datasets of interest; and publishing and sharing merging templates for public re-use and open science. CatMapper does not store observational data. Rather, it is a dynamic, interactive dictionary of keys that helps users integrate observational data from diverse external datasets in disparate formats, thereby complementing and leveraging a fast-growing ecology of datasets that store observational data. We have conducted a heuristic evaluation of CatMapper's usability. The results shed light on ways to enrich semantic data discovery.
  2. Educational Data Mining in Computer Science Education (CSEDM) is an interdisciplinary research community that combines discipline-based computing education research (CER) with educational data mining (EDM) to advance knowledge in ways that go beyond what either research community could do on its own. The JEDM Special Issue on CSEDM received a total of 12 submissions. Each submission was reviewed by at least three reviewers, who brought expertise from both the EDM and CER communities, as well as by one of the special issue editors. Ultimately, three papers were accepted, for an acceptance rate of 25%. These three papers cover a variety of important topics in CSEDM research. Edwards et al. discuss the challenges of collecting, sharing, and analyzing programming data, and contribute two high-quality CS datasets. Gitinabard et al. contribute new approaches for analyzing data from pairs of students working on programs together, and show how such data can inform classroom instruction. Finally, Zhang et al. contribute a novel model for predicting students' programming performance based on their past performance. Together, these papers showcase the complexities of data, analytics, and modeling in the domain of CS, and contribute to our understanding of how students learn in CS classrooms.